AITopics | temporal classification

A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization

Neural Information Processing Systems | Feb-11-2026, 15:11:18 GMT

Sentence summarization aims at compressing a long sentence into a short one that conveys the main idea of the input.

artificial intelligence, machine learning, truncate, (16 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts

Pinto, Paulo J. N., Pinho, Armando J., Pratas, Diogo

arXiv.org Artificial Intelligence | Dec-1-2025

Accurately dating historical texts is essential for organizing and interpreting cultural heritage collections. This article addresses temporal text classification using interpretable, feature-engineered tree-based machine learning models. We integrate five feature categories - compression-based, lexical structure, readability, neologism detection, and distance features - to predict the temporal origin of English texts spanning five centuries. Comparative analysis shows that these feature domains provide complementary temporal signals, with combined models outperforming any individual feature set. On a large-scale corpus, we achieve 76.7% accuracy for century-scale prediction and 26.1% for decade-scale classification, substantially above random baselines (20% and 2.3%). Under relaxed temporal precision, performance increases to 96.0% top-2 accuracy for centuries and 85.8% top-10 accuracy for decades. The final model exhibits strong ranking capabilities with AUCROC up to 94.8% and AUPRC up to 83.3%, and maintains controlled errors with mean absolute deviations of 27 years and 30 years, respectively. For authentication-style tasks, binary models around key thresholds (e.g., 1850-1900) reach 85-98% accuracy. Feature importance analysis identifies distance features and lexical structure as most informative, with compression-based features providing complementary signals. SHAP explainability reveals systematic linguistic evolution patterns, with the 19th century emerging as a pivot point across feature domains. Cross-dataset evaluation on Project Gutenberg highlights domain adaptation challenges, with accuracy dropping by 26.4 percentage points, yet the computational efficiency and interpretability of tree-based models still offer a scalable, explainable alternative to neural architectures.
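
The relaxed-precision figures in this abstract (top-2 for centuries, top-10 for decades) count a prediction as correct when the true class is among the k highest-scoring classes. A minimal Python sketch of that metric is given here; the function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical model output).
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 texts, 5 candidate centuries (all values are made up).
scores = [
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.50, 0.05, 0.05],
    [0.10, 0.20, 0.30, 0.35, 0.05],
]
labels = [1, 1, 0, 0]

print(top_k_accuracy(scores, labels, k=1))  # strict accuracy
print(top_k_accuracy(scores, labels, k=2))  # relaxed, top-2 accuracy
```

Top-k accuracy can only grow with k, which is why the relaxed figures are higher than the strict ones.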

classification, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.23056

Country:

Europe (1.00)
North America > Mexico (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Graph Connectionist Temporal Classification for Phoneme Recognition

Grafé, Henry, Van hamme, Hugo

arXiv.org Artificial Intelligence | Sep-9-2025

Automatic Phoneme Recognition (APR) systems are often trained using pseudo phoneme-level annotations generated from text through Grapheme-to-Phoneme (G2P) systems. These G2P systems frequently output multiple possible pronunciations per word, but the standard Connectionist Temporal Classification (CTC) loss cannot account for such ambiguity during training. In this work, we adapt Graph Temporal Classification (GTC) to the APR setting. GTC enables training from a graph of alternative phoneme sequences, allowing the model to consider multiple pronunciations per word as valid supervision. Our experiments on English and Dutch data sets show that incorporating multiple pronunciations per word into the training loss consistently improves phoneme error rates compared to a baseline trained with CTC. These results suggest that integrating pronunciation variation into the loss function is a promising strategy for training APR systems from noisy G2P-based supervision.
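
To make the contrast with standard CTC concrete, the sketch below computes the CTC probability of one label sequence with the usual forward recursion, then sums it over an explicit list of alternative pronunciations. This enumeration is only a naive stand-in for a graph-based loss, not the authors' GTC implementation: it works because every frame-level alignment collapses to exactly one label sequence, so the alternatives cover disjoint sets of alignments. All names (`ctc_prob`, `multi_pronunciation_loss`) and the toy inputs are assumptions.

```python
import math


def ctc_prob(probs, target, blank=0):
    """CTC probability of `target` given per-frame symbol probabilities.

    probs: list of T frames, each a list of probabilities indexed by symbol id.
    target: list of symbol ids (no blanks).
    """
    # Extended sequence: blank, s1, blank, s2, ..., blank.
    ext = [blank]
    for s in target:
        ext += [s, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][blank]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def multi_pronunciation_loss(probs, alternatives, blank=0):
    """Negative log of the summed CTC probability over distinct alternatives."""
    distinct = {tuple(alt) for alt in alternatives}
    total = sum(ctc_prob(probs, list(alt), blank) for alt in distinct)
    return -math.log(total)


# Toy example: 2 frames, symbols {0: blank, 1: "a", 2: "b"}, uniform outputs.
frames = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
print(ctc_prob(frames, [1]))                         # single pronunciation
print(multi_pronunciation_loss(frames, [[1], [2]]))  # either pronunciation accepted
```

Enumerating alternatives grows combinatorially with the number of words that have several pronunciations, which is the practical reason to encode them in a graph instead.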

artificial intelligence, machine learning, pronunciation, (17 more...)

arXiv.org Artificial Intelligence

2509.05399

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

57587d8d6a7ede0e5302fc22d0878c53-Paper-Conference.pdf

Neural Information Processing Systems | Aug-14-2025, 23:16:37 GMT

label graph, recognition, sequence, (13 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Tobler's First Law in GeoAI: A Spatially Explicit Deep Learning Model for Terrain Feature Detection Under Weak Supervision

Li, Wenwen, Hsu, Chia-Yu, Hu, Maosheng

arXiv.org Artificial Intelligence | Aug-7-2025

Recent interest in geospatial artificial intelligence (GeoAI) has fostered a wide range of applications using artificial intelligence (AI), especially deep learning, for geospatial problem solving. However, major challenges such as a lack of training data and the neglect of spatial principles and spatial effects in AI model design remain, significantly hindering the in-depth integration of AI with geospatial research. This paper reports our work in developing a deep learning model that enables object detection, particularly of natural features, in a weakly supervised manner. Our work makes three contributions: First, we present a method of object detection using only weak labels. This is achieved by developing a spatially explicit model based on Tobler's first law of geography. Second, we incorporate attention maps into the object detection pipeline and develop a multistage training strategy to improve performance. Third, we apply this model to detect impact craters on Mars, a task that previously required extensive manual effort. The model generalizes to both natural and human-made features on the surfaces of Earth and other planets. This research advances the theoretical and methodological foundations of GeoAI.

artificial intelligence, machine learning, proposal, (19 more...)

arXiv.org Artificial Intelligence

2508.03745

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Star Temporal Classification: Sequence Modeling with Partially Labeled Data

Neural Information Processing Systems | Oct-11-2024, 03:22:48 GMT

We develop an algorithm which can learn from partially labeled and unsegmented sequential data. Most sequential loss functions, such as Connectionist Temporal Classification (CTC), break down when many labels are missing. We address this problem with Star Temporal Classification (STC) which uses a special star token to allow alignments which include all possible tokens whenever a token could be missing. We express STC as the composition of weighted finite-state transducers (WFSTs) and use GTN (a framework for automatic differentiation with WFSTs) to compute gradients. We perform extensive experiments on automatic speech recognition.

sequence modeling, star temporal classification, temporal classification, (5 more...)

Neural Information Processing Systems

Genre: Play > Prospect (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.76)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.66)

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Gao, Dongji, Wiesner, Matthew, Xu, Hainan, Garcia, Leibny Paola, Povey, Daniel, Khudanpur, Sanjeev

arXiv.org Artificial Intelligence | Jun-1-2023

This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) criterion. BTC explicitly encodes the uncertainties associated with transcripts during training. This is accomplished by enhancing the flexibility of the training graph, which is implemented as a weighted finite-state transducer (WFST) composition. The proposed algorithm improves the robustness and accuracy of ASR systems, particularly [...]

Figure 1: Examples of error in the transcript: (a) deletion (partial transcript), (b) substitution, (c) insertion, (d) substitution and insertion. The grey box is the exact text and the red box is the imperfect text. Inaccurate words are marked in bold.

artificial intelligence, machine learning, transcript, (14 more...)

arXiv.org Artificial Intelligence

2306.01031

Country:

North America > United States (0.14)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Powerful and Extensible WFST Framework for RNN-Transducer Losses

Laptev, Aleksandr, Bataev, Vladimir, Gitman, Igor, Ginsburg, Boris

arXiv.org Artificial Intelligence | Mar-18-2023

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on a composition of the WFST graphs from acoustic and textual schema -- computationally competitive and easy to modify; (2) "Grid-Transducer", which constructs the lattice directly for further computations -- most compact, and computationally efficient. We illustrate the ease of extensibility through introduction of a new W-Transducer loss -- the adaptation of the Connectionist Temporal Classification with Wild Cards. W-Transducer (W-RNNT) consistently outperforms the standard RNN-T in a weakly-supervised data setup with missing parts of transcriptions at the beginning and end of utterances. All RNN-T losses are implemented with the k2 framework and are available in the NeMo toolkit.

artificial intelligence, implementation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49357.2023.10096679

2303.10384

Country:

North America > United States > Rhode Island (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Star Temporal Classification: Sequence Classification with Partially Labeled Data

Pratap, Vineel, Hannun, Awni, Synnaeve, Gabriel, Collobert, Ronan

arXiv.org Machine Learning | Jan-28-2022

We develop an algorithm which can learn from partially labeled and unsegmented sequential data. Most sequential loss functions, such as Connectionist Temporal Classification (CTC), break down when many labels are missing. We address this problem with Star Temporal Classification (STC) which uses a special star token to allow alignments which include all possible tokens whenever a token could be missing. We express STC as the composition of weighted finite-state transducers (WFSTs) and use GTN (a framework for automatic differentiation with WFSTs) to compute gradients. We perform extensive experiments on automatic speech recognition. These experiments show that STC can recover most of the performance of supervised baseline when up to 70% of the labels are missing. We also perform experiments in handwriting recognition to show that our method easily applies to other sequence classification tasks.
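
One way to read the star token is as a wildcard that may stand for any run of missing tokens, including none. Under that reading, a full transcript is consistent with a partial label sequence exactly when the partial labels appear in it in order. The sketch below checks only this consistency condition; it is an illustrative assumption about the semantics, not the WFST-based loss described in the abstract, and the function name `is_compatible` is hypothetical.

```python
def is_compatible(partial, full):
    """True if `partial` can be turned into `full` by inserting tokens.

    Equivalent to matching the pattern * p1 * p2 * ... * pn * against `full`,
    where each * absorbs zero or more arbitrary tokens.
    """
    remaining = iter(full)
    # `tok in remaining` consumes the iterator up to the first match,
    # so the partial labels must be found in order.
    return all(tok in remaining for tok in partial)


full_transcript = ["the", "cat", "sat", "down"]
print(is_compatible(["cat", "down"], full_transcript))  # labels kept in order
print(is_compatible(["down", "cat"], full_transcript))  # order violated
```

A loss built on this idea would sum the probability of every alignment whose collapsed output passes such a check, rather than testing candidate transcripts one by one.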

classification, recognition, sequence classification, (14 more...)

arXiv.org Machine Learning

2201.12208

Country: Asia > China (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Parcollet, Titouan, Zhang, Ying, Morchid, Mohamed, Trabelsi, Chiheb, Linarès, Georges, De Mori, Renato, Bengio, Yoshua

arXiv.org Machine Learning | Jun-20-2018

Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency to process multidimensional inputs as entities, to encode internal dependencies, and to solve many tasks with less learning parameters than real-valued models. This paper proposes to integrate multiple feature views in quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with less learning parameters than a competing model based on real-valued CNNs.
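
The quaternion algebra mentioned here rests on the Hamilton product, which mixes all four components of each operand, so a feature and its derivatives are processed as a single entity. A minimal Python sketch follows. Packing the energy and its first and second derivatives into the three imaginary parts with a zero real part is an assumption made for illustration, and the names `hamilton` and `pack_features` are hypothetical.

```python
def hamilton(p, q):
    """Hamilton product of quaternions given as (r, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def pack_features(energy, delta, delta_delta):
    """Group one filter-bank value and its two derivatives into a quaternion.

    Assumed layout: zero real part, the three views in the imaginary parts.
    """
    return (0.0, energy, delta, delta_delta)


# One time-frequency bin (made-up values) multiplied by a quaternion weight.
x = pack_features(0.8, -0.1, 0.02)
w = (0.5, 0.1, -0.3, 0.2)
print(hamilton(w, x))

# The product is not commutative: i * j = k, while j * i = -k.
i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(hamilton(i, j), hamilton(j, i))
```

A single quaternion weight carries 4 real parameters but acts on 4 inputs to produce 4 outputs, where a real-valued dense map between the same inputs and outputs would need 16; this is the source of the parameter savings the abstract refers to.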

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

1806.07789

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States (0.14)
Europe > France (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
